Australian Geoscience Datacube API

This notebook describes how to connect to the datacube and run a basic query.


In [1]:
import datacube.api
from pprint import pprint

By default, the API will use the configured database connection found in the config file.

Details on setting up the config file and database can be found here: http://agdc-v2.readthedocs.org/en/develop/db_setup.html


In [2]:
dc = datacube.api.API()

Summary functions

  • list_fields() - lists all fields that can be used for searching
  • list_field_values(field) - lists all the values of the field found in the database

Find out what fields we can search:


In [3]:
dc.list_fields()


Out[3]:
dict_keys(['gsi', 'sat_path', 'product', 'id', 'collection', 'time', 'platform', 'lat', 'orbit', 'sat_row', 'lon', 'instrument'])

The product and platform fields look interesting. Let's find out more about them:


In [4]:
dc.list_field_values('product')


Out[4]:
['gamma0']

In [5]:
dc.list_field_values('platform')


Out[5]:
['SENTINEL_1A', 'ALOS_2']
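Because list_field_values() reports the values actually present in the database, a query can be sanity-checked before it is sent. The sketch below is a hypothetical, stdlib-only helper: check_query and the known_values mapping are not part of datacube.api. Here known_values is filled by hand from Out[4] and Out[5]; in practice it could be built by calling dc.list_field_values(field) for every field returned by dc.list_fields(). The 'dimensions' and 'variables' keys used later in this notebook are not search fields, so they are skipped.

```python
def check_query(query, known_values, ignore=('dimensions', 'variables')):
    """Return a list of problems found in a query (hypothetical helper).

    query maps field names to a single value or a list of values.
    known_values maps field names to the values reported for that field,
    e.g. the output of dc.list_field_values(field). Keys listed in `ignore`
    are not search fields and are skipped.
    """
    problems = []
    for field, wanted in query.items():
        if field in ignore:
            continue
        if field not in known_values:
            problems.append(f"unknown field: {field}")
            continue
        # Accept either a single value or a list of alternatives.
        values = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        for value in values:
            if value not in known_values[field]:
                problems.append(f"{field}={value!r} not found in database")
    return problems


# Values copied from the Out[4] and Out[5] results above.
known_values = {
    'product': ['gamma0'],
    'platform': ['SENTINEL_1A', 'ALOS_2'],
}
print(check_query({'product': 'gamma0', 'platform': ['ALOS_2', 'LANDSAT_8']}, known_values))
# ["platform='LANDSAT_8' not found in database"]
```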

Query and Access functions

There are several API calls that describe and provide data in different ways:

  • get_descriptor() - provides a description of the data for a given query
  • get_data() - provides the data as xarray.DataArrays for each variable. This is usually called based on information returned by the get_descriptor call.
  • get_data_array() - returns an n-dimensional xarray.DataArray object, with the variables stacked along the dimension labelled variables.
  • get_dataset() - returns an xarray.Dataset object, containing an xarray.DataArray for each variable.

get_descriptor

We can make a query and find out about the data:

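The query is a (possibly nested) dict of search terms, keyed by the field names returned by list_fields(). Each value can be a single term or a list of terms.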
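In the example below, asking for 'platform': ['ALOS_2','SENTINEL_1A'] returns both the ALOS-2 and the Sentinel-1A product, which suggests that separate fields are combined with AND while a list means "any of these values". The sketch below encodes that reading; it is an inference from this notebook's output, not code taken from datacube, and matches plus the example records are hypothetical.

```python
def matches(record, query):
    """Return True if a dataset record satisfies every term in query.

    Assumed semantics (inferred, not taken from the datacube source):
    terms are ANDed together, and a list value matches if the record's
    value equals any element of the list.
    """
    for field, wanted in query.items():
        alternatives = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if record.get(field) not in alternatives:
            return False
    return True


# Hypothetical records, one per platform found in Out[5].
records = [
    {'product': 'gamma0', 'platform': 'ALOS_2'},
    {'product': 'gamma0', 'platform': 'SENTINEL_1A'},
]
example_query = {'product': 'gamma0', 'platform': ['ALOS_2', 'SENTINEL_1A']}
print([r['platform'] for r in records if matches(r, example_query)])
# ['ALOS_2', 'SENTINEL_1A']
```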


In [6]:
query = {
    'product': 'gamma0',
    'platform': ['ALOS_2', 'SENTINEL_1A'],
}
descriptor = dc.get_descriptor(query, include_storage_units=False)
pprint(descriptor)


{'alos2_gamma0_albers': {'dimensions': ['time', 'y', 'x'],
                         'irregular_indices': {'time': array(['2016-03-02T10:59:59.000000000+1100'], dtype='datetime64[ns]')},
                         'result_max': (numpy.datetime64('2016-03-02T10:59:59.000000000+1100'),
                                        -4300006.25,
                                        1499993.75),
                         'result_min': (numpy.datetime64('2016-03-02T10:59:59.000000000+1100'),
                                        -4899993.75,
                                        900006.25),
                         'result_shape': (1, 48000, 48000),
                         'variables': {'hh_gamma0': {'datatype_name': dtype('float32'),
                                                     'nodata_value': 0},
                                       'hv_gamma0': {'datatype_name': dtype('float32'),
                                                     'nodata_value': 0}}},
 's1_gamma0_albers': {'dimensions': ['time', 'y', 'x'],
                      'irregular_indices': {'time': array(['2016-03-02T10:59:59.000000000+1100'], dtype='datetime64[ns]')},
                      'result_max': (numpy.datetime64('2016-03-02T10:59:59.000000000+1100'),
                                     -4300006.25,
                                     1499993.75),
                      'result_min': (numpy.datetime64('2016-03-02T10:59:59.000000000+1100'),
                                     -4899993.75,
                                     900006.25),
                      'result_shape': (1, 48000, 48000),
                      'variables': {'vh_gamma0': {'datatype_name': dtype('float32'),
                                                  'nodata_value': 0},
                                    'vv_gamma0': {'datatype_name': dtype('float32'),
                                                  'nodata_value': 0}}}}
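The descriptor prints the time index with a +1100 UTC offset, while the get_dataset output at the end of this notebook shows 2016-03-01T23:59:59 with no offset. Assuming that later value is in UTC, a quick standard-library check shows the two timestamps are consistent with the same instant:

```python
from datetime import datetime, timezone

# Timestamp as printed in the descriptor above (fractional seconds dropped).
local = datetime.fromisoformat('2016-03-02T10:59:59+11:00')
# Shift to UTC: subtracting the 11-hour offset crosses back into 1 March.
utc = local.astimezone(timezone.utc)
print(utc.isoformat())
# 2016-03-01T23:59:59+00:00
```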

The query can be restricted to a particular range along one or more dimensions.

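For spatial queries, the dimension names (x and y) should be used. By default, range values for the spatial dimensions are interpreted as WGS84 longitude/latitude, although a different coordinate reference system can be specified for each dimension.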


In [7]:
query = {
    'product': 'gamma0',
    'platform': ['ALOS_2', 'SENTINEL_1A'],
    'dimensions': {
        'x': {
            'range': (146.0, 147.0),
        },
        'y': {
            'range': (-42.0, -41.0),
        },
        'time': {
            'range': ((2015, 1, 1), (2017, 1, 2)),
        }
    }
}
pprint(dc.get_descriptor(query, include_storage_units=False))


{'alos2_gamma0_albers': {'dimensions': ['time', 'y', 'x'],
                         'irregular_indices': {'time': array(['2016-03-02T10:59:59.000000000+1100'], dtype='datetime64[ns]')},
                         'result_max': (numpy.datetime64('2016-03-02T10:59:59.000000000+1100'),
                                        -4548968.75,
                                        1284918.75),
                         'result_min': (numpy.datetime64('2016-03-02T10:59:59.000000000+1100'),
                                        -4666481.25,
                                        1187756.25),
                         'result_shape': (1, 9402, 7774),
                         'variables': {'hh_gamma0': {'datatype_name': dtype('float32'),
                                                     'nodata_value': 0},
                                       'hv_gamma0': {'datatype_name': dtype('float32'),
                                                     'nodata_value': 0}}},
 's1_gamma0_albers': {'dimensions': ['time', 'y', 'x'],
                      'irregular_indices': {'time': array(['2016-03-02T10:59:59.000000000+1100'], dtype='datetime64[ns]')},
                      'result_max': (numpy.datetime64('2016-03-02T10:59:59.000000000+1100'),
                                     -4548968.75,
                                     1284918.75),
                      'result_min': (numpy.datetime64('2016-03-02T10:59:59.000000000+1100'),
                                     -4666481.25,
                                     1187756.25),
                      'result_shape': (1, 9402, 7774),
                      'variables': {'vh_gamma0': {'datatype_name': dtype('float32'),
                                                  'nodata_value': 0},
                                    'vv_gamma0': {'datatype_name': dtype('float32'),
                                                  'nodata_value': 0}}}}
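The result_shape values can be checked against result_min and result_max. Assuming those are the centres of the first and last pixels on a regular grid, a pixel spacing of 12.5 m reproduces every shape printed so far; that spacing is inferred from the numbers, not read from the metadata. axis_length is a hypothetical helper.

```python
def axis_length(lo, hi, resolution):
    """Number of samples between two pixel-centre coordinates, inclusive.

    Assumes lo and hi are the centres of the first and last pixels and that
    pixels are evenly spaced by `resolution` (12.5 m here, inferred).
    """
    return round((hi - lo) / abs(resolution)) + 1


# (y, x) extents from the restricted descriptor above.
print(axis_length(-4666481.25, -4548968.75, 12.5))  # 9402
print(axis_length(1187756.25, 1284918.75, 12.5))    # 7774
# Full extent from the unrestricted query (same span for y and x).
print(axis_length(-4899993.75, -4300006.25, 12.5))  # 48000
```

For example, (-4548968.75 - (-4666481.25)) / 12.5 = 9401 intervals, hence 9402 pixel centres.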

A coordinate reference system can be provided for the spatial dimensions, either as an EPSG code or a WKT description:


In [8]:
query = {
    'product': 'gamma0',
    'platform': ['ALOS_2', 'SENTINEL_1A'],
    'dimensions': {
        'x': {
            'range': (1187756.25, 1284918.75),
            'crs': 'EPSG:3577',
        },
        'y': {
            'range': (-4666481.25, -4548968.75),
            'crs': 'EPSG:3577',
        },
        'time': {
            'range': ((2016, 1, 1), (2017, 1, 1)),
        }
    }
}
pprint(dc.get_descriptor(query, include_storage_units=False))
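The EPSG:3577 ranges in In [8] are exactly the result_min and result_max values that In [7] reported for its WGS84 query. A follow-up query in the projected CRS can therefore be built directly from a descriptor entry. The helper below is a hypothetical sketch; it assumes, as in the descriptors printed above, that entry['dimensions'] lists names in the same order as the result_min/result_max tuples.

```python
def projected_dimensions(entry, crs):
    """Turn a descriptor entry's result_min/result_max into a 'dimensions'
    dict with explicit CRS ranges for x and y (hypothetical helper).

    Assumes entry['dimensions'] gives the dimension names in the same order
    as the result_min/result_max tuples. Non-spatial dimensions are skipped.
    """
    dims = {}
    for name, lo, hi in zip(entry['dimensions'], entry['result_min'], entry['result_max']):
        if name in ('x', 'y'):
            dims[name] = {'range': (lo, hi), 'crs': crs}
    return dims


# Values copied from the s1_gamma0_albers entry printed by In [7];
# the time element is replaced by None because only x and y are used.
entry = {
    'dimensions': ['time', 'y', 'x'],
    'result_min': (None, -4666481.25, 1187756.25),
    'result_max': (None, -4548968.75, 1284918.75),
}
print(projected_dimensions(entry, 'EPSG:3577'))
```

A time range would still need to be added to the result before using it as the 'dimensions' value of a query.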

get_data

This retrieves the data, usually as a subset, based on the information provided by the get_descriptor call.

The query is in a similar form to the get_descriptor call, with the addition of a variables parameter. If not specified, all variables are returned. The query also accepts an array_range parameter on a dimension, which selects a subset by array indices rather than by labelled coordinates.
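The difference between range and array_range can be illustrated on a single coordinate axis. This is a sketch of one plausible interpretation, not datacube's implementation: it assumes the two range bounds may be given in either order (the notebook uses both (-42.0, -41.0) and (-41, -42)), that array_range is applied to the positions left after the range filter, and that array_range is a half-open (start, stop) pair like a Python slice. select and lons are hypothetical.

```python
def select(coords, value_range=None, array_range=None):
    """Return the indices kept along one dimension (illustrative only).

    value_range filters by coordinate label, with the bounds accepted in
    either order. array_range then slices the surviving positions by index,
    treated here (an assumption) as a half-open (start, stop) pair.
    """
    idx = list(range(len(coords)))
    if value_range is not None:
        lo, hi = sorted(value_range)
        idx = [i for i in idx if lo <= coords[i] <= hi]
    if array_range is not None:
        start, stop = array_range
        idx = idx[start:stop]
    return idx


lons = [145.5, 146.0, 146.5, 147.0, 147.5]
print(select(lons, value_range=(146, 147)))                      # [1, 2, 3]
print(select(lons, value_range=(146, 147), array_range=(0, 1)))  # [1]
```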


In [10]:
query = {
    'product': 'gamma0',
    'platform': 'ALOS_2',
    'variables': ['hh_gamma0', 'hv_gamma0'],
    'dimensions': {
        'x': {
            'range': (146, 147),
            'array_range': (0, 1),
        },
        'y': {
            'range': (-41, -42),
            'array_range': (0, 1),
        },
        'time': {
            'range': ((2016, 1, 1), (2017, 1, 1))
        }
    }
}
data = dc.get_data(query)
data.keys()


Out[10]:
dict_keys(['indices', 'size', 'arrays', 'dimensions', 'element_sizes', 'coordinate_reference_systems'])

get_data_array

This is a convenience function that wraps the get_data function, returning only the data, stacked in a single xarray.DataArray.

The variables are stacked along the variable dimension.
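Stacking only works when every variable has the same shape; the result gains one extra leading axis whose labels are the variable names. The plain-Python sketch below shows that idea without xarray. stack_variables and grid_shape are hypothetical helpers, the input grids are toy values, and sorting the names is a choice made here for a deterministic order (get_data_array() may order variables differently).

```python
def grid_shape(grid):
    """Return (rows, cols) of a rectangular nested list, rejecting ragged input."""
    widths = {len(row) for row in grid}
    if len(widths) != 1:
        raise ValueError("grid is empty or ragged")
    return (len(grid), widths.pop())


def stack_variables(arrays):
    """Stack equally shaped per-variable grids along a leading 'variable' axis.

    arrays maps variable names to 2-D nested lists. Returns the variable
    labels and a 3-D nested list indexed as [variable][row][col].
    """
    names = sorted(arrays)
    shapes = {grid_shape(arrays[name]) for name in names}
    if len(shapes) != 1:
        raise ValueError(f"variables have different shapes: {shapes}")
    return names, [arrays[name] for name in names]


# Toy 2 x 2 grids standing in for two polarisation bands.
hh = [[0.1, 0.2], [0.3, 0.4]]
hv = [[0.01, 0.02], [0.03, 0.04]]
labels, stacked = stack_variables({'hv_gamma0': hv, 'hh_gamma0': hh})
print(labels)         # ['hh_gamma0', 'hv_gamma0']
print(stacked[1][0])  # [0.01, 0.02]
```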


In [12]:
alos2 = dc.get_data_array(product='gamma0', platform='ALOS_2', y=(-41, -42), x=(146, 147))
s1a = dc.get_data_array(product='gamma0', platform='SENTINEL_1A', y=(-41, -42), x=(146, 147))

get_dataset

This is a convenience function similar to get_data_array, returning the data of the query as an xarray.Dataset object.


In [15]:
dc.get_dataset(product='gamma0', platform='SENTINEL_1A', y=(-41, -42), x=(146, 147))


Out[15]:
<xarray.Dataset>
Dimensions:    (time: 1, x: 7774, y: 9402)
Coordinates:
  * time       (time) datetime64[ns] 2016-03-01T23:59:59
  * y          (y) float64 -4.549e+06 -4.549e+06 -4.549e+06 -4.549e+06 ...
  * x          (x) float64 1.188e+06 1.188e+06 1.188e+06 1.188e+06 1.188e+06 ...
Data variables:
    crs        int32 0
    vh_gamma0  (time, y, x) float32 nan nan nan nan nan nan nan nan nan nan ...
    vv_gamma0  (time, y, x) float32 nan nan nan nan nan nan nan nan nan nan ...
Attributes:
    title: Experimental Data files From the Australian Geoscience Data Cube - DO NOT USE
    license: Creative Commons Attribution 4.0 International CC BY 4.0
    product_version: 0.0.0
    source: This data is a reprojection and retile of Landsat surface reflectance data from the USGS
    summary: These files are experimental, short lived, and the format will change.
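The descriptors report nodata_value 0 for the gamma0 variables, and the Dataset above shows nan values, which is consistent with nodata pixels being replaced by NaN before they reach the user; that is an inference from this output, not something the notebook states. If raw values still contain the 0 sentinel, masking them keeps statistics honest. mask_nodata and nanmean are hypothetical helpers and the input row is toy data.

```python
import math


def mask_nodata(values, nodata=0):
    """Replace the nodata sentinel with NaN (nodata=0 as in the descriptors)."""
    return [math.nan if v == nodata else v for v in values]


def nanmean(values):
    """Mean of the non-NaN entries, or None if every entry is NaN."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else None


row = [0.0, 0.25, 0.0, 0.75]       # toy values; 0.0 marks nodata
print(nanmean(mask_nodata(row)))   # 0.5
```

Without the mask, the two nodata pixels would drag the mean down to 0.25.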
